Refactor the baseline Validation Pipeline to align with GSoC Architecture#4017
Refactor the baseline Validation Pipeline to align with GSoC Architecture#4017ayushman1210 wants to merge 16 commits into
Conversation
…rics/plot_time_series, update tests
Please convert this to a dplyr or base R approach. data.table is a good package that lots of people use happily, but it is not widely used in PEcAn and its syntax is confusing to folks who don't know it already. |
divine7022
left a comment
There was a problem hiding this comment.
overall it's in good shape, I dropped inline for thing I'd want a look at before merge
| numer <- sum((dat$obs - mean(dat$obs, na.rm=T)) * (dat$model - mean(dat$model, na.rm=T)), na.rm=T) | ||
| denom <- sqrt(sum((dat$obs - mean(dat$obs, na.rm=T))^2, na.rm=T)) * sqrt(sum((dat$model - mean(dat$model, na.rm=T))^2, na.rm=T)) | ||
| (numer / denom)^2 | ||
| } |
There was a problem hiding this comment.
aligned dataframe returns c("time", "model", "obs") but the convention in this module is c("model", "obvs", "time") and read by every metric. any reason align_by_time doesn't follow it ?
| res <- run_benchmark( | ||
| model_path = "inst/testdata/sample_model.csv", | ||
| obs_path = "inst/testdata/sample_obs.csv" | ||
| ) |
There was a problem hiding this comment.
ur using model_path = ..., obs_path = ... but signature is run_benchmark(model_df, obs_df, ...) looks like README is from an earlier API where function took paths; folks using this ex would hit an error. probably want to update this and the parameter list ryt below it and pass as df
| #' Load and standardize arbitrary tabular data using a YAML mapping configuration | ||
| #' | ||
| #' @param data.path character, file path to the tabular data (e.g. .csv) | ||
| #' @param mapping.path character, file path to the YAML mapping configuration | ||
| #' @return A standardized data frame with column names mapped to PEcAn standard vocabulary | ||
| #' @export |
There was a problem hiding this comment.
- i couldn't see load_and_map_data in NAMESPACE either a man page, looks like documen() hasn't re run
- wondering about registry format as we were discussing abt frictionless data / data package.json, was the simpler yaml approach an intentional phase 2 simplification with frictionless coming later or did direction shift ?
either's fine, just want to make sure we're on same page, I'd defer to @dlebauer probably has perspective here
| bm_validate <- function(model_df, obs_df) { | ||
| for (df in list(model_df, obs_df)) { | ||
| if (!inherits(df$time, "POSIXct")) | ||
| stop("Column 'time' must be POSIXct, got: ", class(df$time)) | ||
| if (!is.numeric(df$value)) | ||
| stop("Column 'value' must be numeric, got: ", class(df$value)) | ||
| } | ||
| invisible(TRUE) | ||
| } |
There was a problem hiding this comment.
folks who passes df with column named timestamp or date instead of time hits df$time returning NULL, so error reads reads which is technically correct but doesn't tell user actual problem is missing column
worth checking column existence before type!
| stop("Column 'time' must be POSIXct, got: ", class(df$time)) | ||
| if (!is.numeric(df$value)) | ||
| stop("Column 'value' must be numeric, got: ", class(df$value)) |
There was a problem hiding this comment.
please use PEcAn.logger consistently, so messages land in the workflow log consistently
| compute_metrics <- function(aligned, metrics = c("RMSE", "MAE", "R2")) { | ||
| # Future-proofing: Functions in the registry now accept the full aligned dataframe | ||
| # This aligns with the decoupled metric architecture introduced in PR #3888 | ||
| METRIC_REGISTRY <- list( | ||
| RMSE = function(dat) sqrt(mean((dat$model - dat$obs)^2, na.rm = TRUE)), | ||
| MAE = function(dat) mean(abs(dat$model - dat$obs), na.rm = TRUE), | ||
| R2 = function(dat) { |
There was a problem hiding this comment.
registry is in the ryt shape. but one thing, it's closed inside the fun body, so adding new metric means editing this function rather than registering one externally. worth adding to a package level object that callers can extend
bigger question, what's the plan for prediction interval coverage and PMU ? together they're CAR SEP regulatory pass criterion: bias <= PMU AND >=90% coverage. neither metric is in PEcAn.benchmark today, and neither's in this PR. they need different inputs than current dat$model / dat$obs (PMU needs per treatment pair SE and replicate counts; coverage needs per prediction quantiles)
worth thinking about whether the input should grow to support them now rather than reworking later
NSE / MEF is a third worth adding, not regulatory required, but standard in the broader literature for cross paper comparability.
cheap if you're already adding metrics
| time = as.POSIXct(seq(0, 3600*3, by = 3600), origin = "1970-01-01", tz = "UTC"), | ||
| value = c(1.1, 1.9, 3.2, 3.9) | ||
| ) | ||
|
|
There was a problem hiding this comment.
couple of gaps here. none of the tests exercise R2 path, so bug I flagged regarding that ships green.
so worth tightening at least the R2 path, the alignment value assertion, and tolerance dropping case before this lands
| obs = obs_df$value[nearest_idx][valid] | ||
| ) | ||
|
|
||
| return(aligned) |
There was a problem hiding this comment.
tolerance filter is silent here. model and obs typically run at different timesteps in validation work, so default tolerance can drop most model points without user knowing.
add one line info log of kept vs dropped would make this visible
| plot_time_series <- function(aligned) { | ||
| ggplot2::ggplot(aligned, ggplot2::aes(x = .data$time)) + | ||
| ggplot2::geom_line(ggplot2::aes(y = .data$model, color = "Model")) + | ||
| ggplot2::geom_line(ggplot2::aes(y = .data$obs, color = "Obs")) + | ||
| ggplot2::labs(color = "", y = "value", title = "Model vs Observations") + | ||
| ggplot2::theme_bw() |
There was a problem hiding this comment.
observations are typically sparse relative to model output in validation workflows. plotting both as geom_line connects obs points with segments that imply information they don't have. convention is points for obs, line for model
This issue tracks the follow-up changes required for the dataframe-first validation framework introduced in PR #3892 aligning it with the architecture outlined in my GSoC 2026 workplan
Planned Changes
run_benchmark()entry point into a flexible 4-stage pipeline:Validate -> Align -> Compute -> Plot.Context
PR #3892 established the baseline dataframe-first concept. The changes tracked in this issue will evolve it into the generalized, high-performance toolkit planned for Phase 2 of my GSoC project.